ABSTRACT


Three gradient-type procedures for unconstrained minimization are 
suggested. These procedures are hybrids between steepest descent and 
conjugate gradient algorithms, employing a design parameter to achieve 
adaptive sequence breaking. Essential convergence theory is presented
in a unified fashion, and limited computational results are included to 
verify the efficacy of the form of the procedures. The computational 
results suggest that a normalized form of the Fletcher-Reeves algorithm 
is preferable to the original form. 


I. INTRODUCTION 


For the minimization of unconstrained functions, computational evidence 
suggests that algorithms which combine features of steepest descent and 
conjugate gradient algorithms may be effective. For example, it is well 
known that the Fletcher-Powell algorithm [1] often performs much worse
than steepest descent far from a minimum. In common with most conjugate
gradient methods, this behavior is usually countered by incorporating periodic 
steepest descent steps (sequence breaking) in the algorithm, leading to a 
hybrid algorithm. Many other possibilities for hybrid algorithms exist. 
Unfortunately, at the present time the design of good algorithms of this 
type is at best ad hoc, often based on heuristic results not amenable to a 
clear theoretical statement. We know why steepest descent works; we know
it converges slowly near a minimum. We know how and why conjugate gradient 
methods work for quadratic functions; we know why they exhibit rapid final 
convergence for general functions. We do not know what overall improvement 
in convergence might be possible by an appropriate interleaving, or modi- 
fication, of these techniques. 

This paper does not purport to rectify the preceding situation. Rather, 
in recognition of the wide latitude with which search directions compatible 
with convergence may be chosen, several algorithms of comparable complexity
are suggested. These algorithms employ a design parameter to produce adaptive
sequence breaking. Convergence theory applicable to a wide class of algorithms 
is developed in conjunction with the algorithms. 

Limited computational results are presented, including comparative
results on steepest descent and standard conjugate gradient methods.
These results confirm that non-conjugate gradient techniques can manifest
quite good behavior for functions which are not quadratic. One method
tested, a normalized version of the Fletcher-Reeves conjugate gradient
algorithm [2] which requires storage of one less number, was found to
converge faster than the standard Fletcher-Reeves method. Although this is
theoretically impossible, it is apparently due to the compatibility of the
normalized version with the Davidon search procedure [3,1] employed to
obtain a minimum (approximately) in a given search direction. The normalized
form would appear to be preferable to the original form of the Fletcher-
Reeves algorithm.


II. ALGORITHMS: DESCRIPTIONS AND THEORY 

Let $f: R^n \to R$ be a continuously differentiable function. We are
concerned here with algorithms designed to locate stationary points of $f$,
i.e., points $x^*$ such that $g(x^*) = 0$, where $g: R^n \to R^n$ is the
gradient of $f$. Under additional assumptions, such as $f$ convex, or
$(\partial^2/\partial x^2) f(x^*)$ positive definite, $x^*$ is a global
minimum or a local minimum respectively. The algorithms all take the
following standard form.

Basic iterative form Given $x_0$ arbitrary, compute the sequence
$x_1, x_2, \ldots$ by the steps:

(i) If $g(x_i) = g_i \neq 0$, choose a feasible direction $p_i$ such that
$\langle g_i, p_i \rangle < 0$.

(ii) Compute $x_{i+1} = x_i + \lambda_i p_i$ such that

$$f(x_{i+1}) = f(x_i + \lambda_i p_i) = \min_{\lambda \ge 0} f(x_i + \lambda p_i).$$

It follows that $f(x_{i+1}) < f(x_i)$ and

$$\langle p_i, g_{i+1} \rangle = 0 \quad \text{for } i = 0, 1, 2, \ldots \tag{1}$$

The various algorithms differ only in the feasible search directions chosen.
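The basic iterative form can be sketched in Python as follows. This is only an illustration under stated assumptions, not the Fortran IV code used for the experiments: the names `descend`, `golden_line_search` and `choose_direction` are hypothetical, and a golden-section search stands in for the Davidon cubic-interpolation search, assuming $f$ is unimodal along each search direction.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def step(x, lam, p):
    """Return the point x + lam * p."""
    return [xi + lam * pi for xi, pi in zip(x, p)]

def golden_line_search(f, x, p, first=1.0, iters=80):
    """Approximate the minimizer of lam -> f(x + lam * p) over lam >= 0."""
    phi = lambda lam: f(step(x, lam, p))
    hi = first
    # Double the trial step until the function stops decreasing.
    while phi(2.0 * hi) < phi(hi) and hi < 1e12:
        hi *= 2.0
    lo, hi = 0.0, 2.0 * hi            # bracket for a unimodal phi
    r = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        a = hi - r * (hi - lo)
        b = lo + r * (hi - lo)
        if phi(a) < phi(b):
            hi = b                    # minimizer lies in [lo, b]
        else:
            lo = a                    # minimizer lies in [a, hi]
    return 0.5 * (lo + hi)

def descend(f, grad, x0, choose_direction, tol=1e-6, max_steps=500):
    """Iterate x_{i+1} = x_i + lam_i * p_i until ||g|| < tol or max_steps."""
    x, g = list(x0), grad(x0)
    g_prev = p_prev = None
    steps = 0
    while steps < max_steps and math.sqrt(dot(g, g)) >= tol:
        p = choose_direction(g, g_prev, p_prev)
        if dot(g, p) >= 0.0:
            raise ValueError("direction is not feasible: <g, p> must be < 0")
        lam = golden_line_search(f, x, p)
        x = step(x, lam, p)
        g_prev, p_prev, g = g, p, grad(x)
        steps += 1
    return x, steps

def steepest_descent(g, g_prev, p_prev):
    """The simplest feasible choice, p_i = -g_i."""
    return [-gi for gi in g]
```

Any rule that maps the current gradient, the previous gradient and the previous direction to a feasible direction can be passed as `choose_direction`; steepest descent is shown only as the simplest instance.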
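Algorithm $A_1(\beta)$ Let $\beta \in (0,1]$, let $g_0 \neq 0$, and for
$i \ge 1$, let $s_i = g_i - g_{i-1}$,

$$p_i' = -g_i + \frac{\langle g_i, s_i \rangle}{\|s_i\|^2}\, s_i.$$

Then, set $p_0 = -g_0$, and for $i = 1, 2, \ldots$ set

$$p_i = \begin{cases} p_i' & \text{if } \|p_i'\|^2 \ge \beta \|g_i\|^2 \\ -g_i & \text{otherwise.} \end{cases} \tag{2}$$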


Proposition 1 The algorithm $A_1(\beta)$, when specialized to the quadratic
function $x \mapsto \langle x, Qx \rangle$, $Q$ symmetric positive-definite,
satisfies

$$\langle p_{i+1}', Q p_i \rangle = 0 \quad \text{for } i = 0, 1, 2, \ldots \tag{3}$$

Proof: Trivial; $\langle s_{i+1}, p_{i+1}' \rangle = 0$, and
$s_{i+1} = 2\lambda_i Q p_i$, $\lambda_i > 0$.

Proposition (1) indicates that if $p_{i+1} = p_{i+1}'$, then $p_{i+1}$ and
$p_i$ are Q-conjugate. However the relation $\langle p_i, Q p_j \rangle = 0$,
$i < j$, is not satisfied even if $p_k = p_k'$ for $k = i+1, \ldots, j$. The
method is not a conjugate gradient method.

Proposition 2 For all $i$, the algorithm $A_1(\beta)$ satisfies

$$-\|g_i\|^2 \le \langle p_i, g_i \rangle = -\|p_i\|^2 \le -\beta \|g_i\|^2. \tag{4}$$

Proof: Relation (4) is clearly true if $p_i = -g_i$.
Suppose $p_i = p_i'$. Noting that $\langle p_i', s_i \rangle = 0$ yields
$\langle p_i, g_i \rangle = -\|p_i\|^2$. But

$$\|p_i\|^2 = \|g_i\|^2 - \frac{\langle g_i, s_i \rangle^2}{\|s_i\|^2},$$

implying $\|p_i\|^2 \le \|g_i\|^2$, and (4) follows from the test in (2).

Proposition (2), along with similar relations for the following
algorithms, will later be invoked to establish convergence to stationary
points.
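As an illustration, the direction rule of $A_1(\beta)$ can be sketched in Python. The sketch assumes the definition reads $s_i = g_i - g_{i-1}$, $p_i' = -g_i + (\langle g_i, s_i \rangle / \|s_i\|^2) s_i$, with $p_i'$ accepted when $\|p_i'\|^2 \ge \beta \|g_i\|^2$; the function name `a1_direction` and the fallback for an unchanged gradient are hypothetical additions.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def a1_direction(g, g_prev, beta):
    """Search direction of A_1(beta); g_prev is the previous gradient g_{i-1}."""
    steepest = [-gi for gi in g]
    s = [gi - gj for gi, gj in zip(g, g_prev)]       # s_i = g_i - g_{i-1}
    ss = dot(s, s)
    if ss == 0.0:                                    # assumed fallback: gradient unchanged
        return steepest
    c = dot(g, s) / ss
    trial = [-gi + c * si for gi, si in zip(g, s)]   # p_i'
    # Adaptive sequence breaking: keep p_i' only if it is long enough relative to g_i.
    if dot(trial, trial) >= beta * dot(g, g):
        return trial
    return steepest
```

With $g_i = (1, 2)$ and $g_{i-1} = (2, -1)$ the trial direction is $(-1.5, -0.5)$, which is orthogonal to $s_i$ and satisfies $\langle p_i', g_i \rangle = -\|p_i'\|^2 = -2.5$; it is accepted for $\beta = 0.4$ and rejected for $\beta = 0.6$.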

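Algorithm $A_2(\beta)$ (Modified Fletcher-Reeves)

Let $\beta \in [0,1]$, let $g_0 \neq 0$, and for $i \ge 1$, let

$$p_i' = -g_i + \frac{\|g_i\|^2}{\|p_{i-1}\|^2}\, p_{i-1}.$$

Then, set $p_0 = -g_0$, and for $i = 1, 2, \ldots$ set

$$p_i = \begin{cases} \beta_i p_i' & \text{if } \beta \|p_i'\| \le \|g_i\|, \quad \beta_i \text{ arbitrary in } [\beta, 1] \\ -g_i & \text{otherwise.} \end{cases} \tag{5}$$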


Proposition 3 The algorithm $A_2(\beta)$, when specialized to the quadratic
function $x \mapsto \langle x, Qx \rangle$, $Q$ symmetric positive definite,
satisfies

$$\langle p_{i+1}', Q p_i \rangle = 0 \quad \text{if } p_i = -g_i. \tag{6}$$

Proposition (3) indicates that algorithm $A_2(\beta)$ takes a Q-conjugate
step following a steepest descent step. It is not a conjugate gradient
method.

Proposition 4 For all $i$, the algorithm $A_2(\beta)$ satisfies

$$\langle p_i, g_i \rangle \le -\beta \|g_i\|^2. \tag{7}$$

Proof: If $p_i = -g_i$, (7) is satisfied. Thus, suppose $p_i = \beta_i p_i'$. Then

$$\langle p_i, g_i \rangle = -\beta_i \|g_i\|^2 \le -\beta \|g_i\|^2 \quad \text{by (1), (5)},$$

$$\beta^2 \|p_i\|^2 = \beta^2 \beta_i^2 \|p_i'\|^2 \le \beta^2 \|p_i'\|^2 \le \|g_i\|^2 \quad \text{by (5)}.$$
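A minimal Python sketch of the $A_2(\beta)$ direction rule follows. It assumes the trial direction is $p_i' = -g_i + (\|g_i\|^2 / \|p_{i-1}\|^2) p_{i-1}$ and that $\beta_i p_i'$ is accepted when $\beta \|p_i'\| \le \|g_i\|$ with $\beta_i \in [\beta, 1]$; the name `a2_direction` and its argument order are hypothetical.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def a2_direction(g, p_prev, beta, beta_i=1.0):
    """Search direction of A_2(beta); p_prev is the previous direction p_{i-1}."""
    if not (beta <= beta_i <= 1.0):
        raise ValueError("beta_i must lie in [beta, 1]")
    gg = dot(g, g)
    c = gg / dot(p_prev, p_prev)
    trial = [-gi + c * pi for gi, pi in zip(g, p_prev)]   # p_i'
    # Compare squared quantities: beta^2 ||p_i'||^2 <= ||g_i||^2.
    if beta * beta * dot(trial, trial) <= gg:
        return [beta_i * ti for ti in trial]
    return [-gi for gi in g]                              # sequence break
```

When $p_{i-1}$ is orthogonal to $g_i$, as relation (1) guarantees after an exact line search, the accepted direction satisfies $\langle p_i, g_i \rangle = -\beta_i \|g_i\|^2$.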

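Algorithm $A_3(\beta)$ (Normalized Fletcher-Reeves)

Let $\beta \in [0,1]$, let $g_0 \neq 0$, and for $i \ge 1$, let

$$\beta_i = \frac{\|p_{i-1}\|^2}{\|p_{i-1}\|^2 + \|g_i\|^2}.$$

Then set $p_0 = -g_0$, and for $i = 1, 2, \ldots$ set

$$p_i = \begin{cases} \beta_i \left( -g_i + \dfrac{\|g_i\|^2}{\|p_{i-1}\|^2}\, p_{i-1} \right) & \text{if } \beta_i \ge \beta \\ -g_i & \text{otherwise.} \end{cases} \tag{8}$$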


Proposition 5 The algorithm $A_3(0)$ is equivalent to the Fletcher-Reeves
algorithm defined by $p_0' = -g_0$,

$$p_i' = -g_i + \frac{\|g_i\|^2}{\|g_{i-1}\|^2}\, p_{i-1}' \quad \text{for } i \ge 1. \tag{9}$$

Proof: Equivalence of (9) and $A_3(0)$ follows if we can establish that
$p_i$ and $p_i'$ are collinear, implying that (9) and (8) will lead to the
same sequence $x_0, x_1, \ldots$. In particular we shall establish that

$$p_i = \frac{\|g_i\|^2}{\|p_i'\|^2}\, p_i' \quad \text{for } i = 0, 1, \ldots \tag{10}$$

Clearly (10) holds for $i = 0$; suppose then that (10) holds for
$i = 0, 1, \ldots, k$. Then

$$p_{k+1} = \beta_{k+1} \left( -g_{k+1} + \frac{\|g_{k+1}\|^2}{\|p_k\|^2}\, p_k \right) = \beta_{k+1} \left( -g_{k+1} + \frac{\|g_{k+1}\|^2}{\|g_k\|^2}\, p_k' \right) = \beta_{k+1}\, p_{k+1}', \quad \text{by (10)}.$$

But, again using (10), together with (9) and (1),

$$\beta_{k+1} = \frac{\|p_k\|^2}{\|p_k\|^2 + \|g_{k+1}\|^2} = \frac{\|g_{k+1}\|^2}{\|g_{k+1}\|^2 + \dfrac{\|g_{k+1}\|^4}{\|g_k\|^4} \|p_k'\|^2} = \frac{\|g_{k+1}\|^2}{\|p_{k+1}'\|^2},$$

hence

$$p_{k+1} = \frac{\|g_{k+1}\|^2}{\|p_{k+1}'\|^2}\, p_{k+1}'.$$

The algorithm $A_3(0)$ is thus a conjugate gradient method, by equivalence
with the Fletcher-Reeves algorithm. Proof that (9) defines a conjugate
gradient method may be found in [4].
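The equivalence can be checked numerically on a small quadratic. The sketch below is an illustration only: it assumes the normalized update is $p_i = \beta_i(-g_i + (\|g_i\|^2/\|p_{i-1}\|^2) p_{i-1})$ with $\beta_i = \|p_{i-1}\|^2 / (\|p_{i-1}\|^2 + \|g_i\|^2)$, uses the exact step length for $f(x) = \langle x, Qx \rangle$, and the function name `run` and the test matrix are hypothetical.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def matvec(Q, x):
    return [dot(row, x) for row in Q]

def run(Q, x0, normalized, steps):
    """Minimize f(x) = <x, Qx> with exact line searches.

    Returns a list of (x_i, g_i, p_i) triples for i = 0, ..., steps.
    normalized=False uses Fletcher-Reeves; True uses the normalized form."""
    x = list(x0)
    g = [2.0 * v for v in matvec(Q, x)]               # gradient of <x, Qx>
    p = [-v for v in g]
    history = [(x, g, p)]
    for _ in range(steps):
        lam = -dot(g, p) / (2.0 * dot(p, matvec(Q, p)))   # exact minimizer along p
        x = [xi + lam * pi for xi, pi in zip(x, p)]
        g_new = [2.0 * v for v in matvec(Q, x)]
        gg = dot(g_new, g_new)
        if normalized:
            pp = dot(p, p)
            b = pp / (pp + gg)                        # beta_i
            p = [b * (-gi + (gg / pp) * pi) for gi, pi in zip(g_new, p)]
        else:
            c = gg / dot(g, g)
            p = [-gi + c * pi for gi, pi in zip(g_new, p)]
        g = g_new
        history.append((x, g, p))
    return history
```

Running both variants from the same starting point gives the same iterates, and each normalized direction equals the Fletcher-Reeves direction scaled by $\|g_i\|^2 / \|p_i'\|^2$, in agreement with (10).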

Proposition 6 For all $i$, the algorithm $A_3(\beta)$ satisfies

$$-\|g_i\|^2 \le \langle p_i, g_i \rangle = -\|p_i\|^2 \le -\beta \|g_i\|^2. \tag{11}$$

Proof: Relation (11) is trivial if $p_i = -g_i$; thus assume

$$p_i = \beta_i \left( -g_i + \frac{\|g_i\|^2}{\|p_{i-1}\|^2}\, p_{i-1} \right).$$

Then $\langle p_i, g_i \rangle = -\beta_i \|g_i\|^2$ by (1). But

$$\|p_i\|^2 = \beta_i^2 \left( \|g_i\|^2 + \frac{\|g_i\|^4}{\|p_{i-1}\|^2} \right) = \beta_i \|g_i\|^2,$$

hence $\langle p_i, g_i \rangle = -\|p_i\|^2$, and (11) follows.


Convergence of the algorithms To prove convergence of the preceding 
algorithms, in an appropriate sense, we shall state a version of a general 
convergence theorem, essentially drawn from [5]. 

Let $f: R^n \to R$ be continuously differentiable, and let $x \mapsto A(x)$ be
a point-to-set mapping such that if $x$ is not stationary, there exist
$\delta = \delta(x) > 0$ and $\epsilon = \epsilon(x) > 0$ satisfying

$$\sup_{y \in A(x')} f(y) \le f(x') - \delta \quad \text{for all } \|x' - x\| \le \epsilon. \tag{12}$$

Given any $x_0$, set $i = 0$, and construct the sequence $x_1, x_2, \ldots$
according to the algorithm

(13)  Step 1  If $x_i$ is stationary, stop.
              If $x_i$ is not stationary, choose $x_{i+1} \in A(x_i)$.
      Step 2  Set $i = i+1$, and go to step 1.

Theorem 1 If the sequence $x_0, x_1, \ldots$ is generated according to the
algorithm (13) , then either the sequence is finite and the last point is 
stationary, or the sequence is infinite, and any accumulation point is 
stationary. 

Remark 1 If the sequence is infinite, accumulation points need not exist. 

Any assumption which ensures that the computation is carried out in a 
bounded set ensures that a stationary point will be found. 

The proof of theorem (1) is trivial for the finite case, and a 
straightforward consequence of continuity for the infinite case. 

We now prove convergence, in the sense of theorem (1), for the algorithms
$A_1(\beta)$, $A_2(\beta)$, $A_3(\beta)$, with $\beta \in (0,1]$. We do this by
constructing a set of search directions, compatible with convergence, and
rich enough to include the directions specified by the preceding
algorithms. Towards this end, for any $\gamma \in (0,1]$, and any $M > 0$,
we define the set

$$P_{\gamma,M}(x) = \{ p \mid \|p\| \le M \|g(x)\|, \ \langle p, g(x) \rangle \le -\gamma \|g(x)\|^2 \} \tag{14}$$

and the point-to-set mapping $A_{\gamma,M}$ by

$$A_{\gamma,M}(x) = \{ y \mid y = x + \lambda(x,p)\, p, \text{ where } f(y) = \min_{\lambda \ge 0} f(x + \lambda p), \ p \in P_{\gamma,M}(x) \}. \tag{15}$$
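Membership in the direction set $P_{\gamma,M}(x)$ is a two-part test that is cheap to monitor during a computation. A minimal sketch, assuming the set is defined by $\|p\| \le M\|g(x)\|$ and $\langle p, g(x) \rangle \le -\gamma \|g(x)\|^2$; the function name `in_direction_set` is hypothetical.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def in_direction_set(p, g, gamma, M):
    """True if p lies in P_{gamma,M} for the gradient g."""
    gg = dot(g, g)
    bounded = math.sqrt(dot(p, p)) <= M * math.sqrt(gg)   # ||p|| <= M ||g||
    downhill = dot(p, g) <= -gamma * gg                   # <p, g> <= -gamma ||g||^2
    return bounded and downhill
```

The steepest descent direction $p = -g$ passes the test for every $\gamma \in (0,1]$ and $M \ge 1$, while a direction orthogonal to $g$ fails it for every $\gamma > 0$.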

Lemma 1 Let $x$ be an arbitrary point, not stationary, let $\gamma \in (0,1]$,
and $M > 0$. Then there exist $\delta = \delta(x) > 0$ and
$\epsilon = \epsilon(x) > 0$ such that

$$\sup_{y \in A_{\gamma,M}(x')} f(y) \le f(x') - \delta \quad \text{for } \|x' - x\| \le \epsilon. \tag{16}$$

Theorem 2 If the sequence $x_0, x_1, \ldots$ is generated according to the
algorithm of the form (13), using the mapping $A_{\gamma,M}$ (15), then either the
sequence is finite and the last point is stationary, or the sequence is 
infinite, and any accumulation point is stationary. 

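Proof of lemma (1) Let $g(x) \neq 0$; then there exist $\epsilon > 0$,
$\alpha > 0$ such that

$$\alpha \le \|g(x')\| \le 2\alpha \quad \text{for all } \|x' - x\| \le 2\epsilon, \tag{17}$$

$$\|g(x') - g(x'')\| \le \frac{\gamma\alpha}{4M} \quad \text{for all } \|x' - x\| + \|x'' - x\| \le 3\epsilon.$$

Let $\lambda = \epsilon/(2M\alpha)$; then for all $\|x - x'\| \le \epsilon$, and
for all $p \in P_{\gamma,M}(x')$, $\|x' + \lambda p - x\| \le 2\epsilon$. Thus

$$\begin{aligned}
f(x' + \lambda p) &= f(x') + \lambda \langle p, g(x' + \eta\lambda p) \rangle, \quad 0 < \eta < 1 \quad \text{(mean-value theorem)} \\
&= f(x') + \lambda \langle p, g(x') \rangle + \lambda \langle p, g(x' + \eta\lambda p) - g(x') \rangle \\
&\le f(x') - \lambda\gamma \|g(x')\|^2 + \lambda \|p\| \frac{\gamma\alpha}{4M} \quad \text{by (14), (17)} \\
&\le f(x') - \lambda\gamma\alpha^2 + \lambda (M 2\alpha) \frac{\gamma\alpha}{4M} \quad \text{by (14), (17)} \\
&= f(x') - \delta, \quad \delta = \tfrac{1}{2} \lambda\gamma\alpha^2.
\end{aligned}$$

Hence, for all $\|x' - x\| \le \epsilon$, and for all $p \in P_{\gamma,M}(x')$,
$f(x' + \lambda(x',p)\,p) \le f(x' + \lambda p) \le f(x') - \delta$, i.e., (16)
holds, and the proof is complete.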

Convergence of the algorithms $A_1(\beta)$, $A_2(\beta)$ and $A_3(\beta)$,
with $\beta \in (0,1]$, now follows directly from theorem (2), and
propositions (2), (4), and (6). For $A_1(\beta)$, proposition (2) yields
$p_i \in P_{\beta,1}(x_i)$; for $A_2(\beta)$, proposition (4) yields
$p_i \in P_{\beta,1/\beta}(x_i)$; for $A_3(\beta)$, proposition (6) yields
$p_i \in P_{\beta,1}(x_i)$.

Finally, we note that these results include the steepest descent algorithm,
$p_i = -g_i$, which is equivalent to $A_1(1)$, $A_2(1)$, or $A_3(1)$.

Remark 2 A set $P_\gamma(x)$ compatible with convergence in the same sense as
$P_{\gamma,M}(x)$ is

$$P_\gamma(x) = \{ p \mid \langle p, g(x) \rangle \le -\gamma \|p\| \|g(x)\| \}. \tag{18}$$

Note that $P_{\gamma,M}(x) \subset P_{\gamma/M}(x)$; $P_{\gamma,M}(x) \subset P_\gamma(x)$ if $M = 1$.

The parameter $\beta \in (0,1]$ serves as a design parameter which governs
how orthogonal we allow $p_i$ and $g_i$ to become. If $\beta$ is allowed to
take the value 0, theorem (2) is no longer applicable. To prove convergence of
the algorithm $A_3(0)$, which corresponds to the Fletcher-Reeves algorithm
without sequence breaking, we shall prove a convergence theorem motivated
by some known results for the Fletcher-Reeves method.

Theorem 3* Suppose that the sequence $x_0, x_1, \ldots$ corresponds to a
sequence $p_0, p_1, \ldots$ of feasible search directions, and that
$x_0, x_1, \ldots$ has a convergent subsequence $x_0', x_1', \ldots \to x^*$.
If $f$ is a twice continuously differentiable function, and if there exists
a sequence $\gamma_0, \gamma_1, \ldots$ of positive scalars such that

$$\langle p_i, g_i \rangle \le -\gamma_i \|p_i\| \|g_i\|, \quad i = 0, 1, \ldots \tag{19}$$

$$\sum_{i=0}^{\infty} (\gamma_i')^2 = \infty \tag{20}$$

(where the prime denotes the subsequence of interest), then $g(x^*) = 0$,
i.e., $x^*$ is a stationary point.

*The form which this theorem ought to take came to the author's attention
through unpublished results of G. Ribiere of I.B.M., Paris, France, and
G. Zoutendijk of The University of Leiden, Netherlands, as communicated
to the author by E. Polak, University of California, Berkeley.

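Proof: Assume that theorem (3) is false, i.e., $g(x^*) \neq 0$. Then there
exist $\epsilon > 0$, $\delta > 0$, $\alpha > 0$, such that

$$\|g(x)\| \ge \delta \quad \text{for all } \|x - x^*\| \le 2\epsilon \tag{21}$$

$$\|H(x)\| \le \alpha \quad \text{for all } \|x - x^*\| \le 2\epsilon, \qquad \left( H(x) = \frac{\partial^2 f}{\partial x^2}(x) \right). \tag{22}$$

Let $h_i = \dfrac{1}{\|p_i\| \|g_i\|}\, p_i$, and consider the expansion

$$\begin{aligned}
f(x_i' + \lambda h_i') &= f(x_i') + \lambda \langle g_i', h_i' \rangle + \frac{\lambda^2}{2} \langle h_i', H(x_i' + \eta\lambda h_i')\, h_i' \rangle, \quad 0 < \eta < 1 \\
&\le f(x_i') + \lambda \langle g_i', h_i' \rangle + \frac{\lambda^2}{2} \|h_i'\|^2 \|H(x_i' + \eta\lambda h_i')\| \\
&\le f(x_i') + \lambda \langle g_i', h_i' \rangle + \frac{\lambda^2 \alpha}{2\delta^2}, \quad \text{by (21), (22)},
\end{aligned}$$

providing $\|x_i' - x^*\| \le \epsilon$ and $\lambda \le \epsilon\delta$ (which
implies $\|\eta\lambda h_i'\| = \eta\lambda / \|g_i'\| \le \epsilon$).

Note that $\gamma_i' \in (0,1]$ by (19); thus if $0 < \ell \le 1$ is chosen
such that $\ell\delta/\alpha \le \epsilon$, then
$\lambda_i = \ell \gamma_i' \delta^2 / \alpha$ satisfies
$\lambda_i \le \epsilon\delta$ for all $i = 0, 1, \ldots$. Hence

$$f(x_{i+1}') \le f(x_i' + \lambda_i h_i') \le f(x_i') - \lambda_i \gamma_i' + \frac{\lambda_i^2 \alpha}{2\delta^2} \le f(x_i') - \frac{\ell\delta^2}{2\alpha} (\gamma_i')^2,$$

since $0 < \ell \le 1$ and $\langle g_i', h_i' \rangle \le -\gamma_i'$ by (19).
Setting $\nu = \ell\delta^2/(2\alpha) > 0$, we find that
$f(x_{i+1}') \le f(x_i') - \nu (\gamma_i')^2$ for all $\|x_i' - x^*\| \le \epsilon$,
hence (20) implies $f(x_i') \to -\infty$, which is a contradiction
($f(x_i') \to f(x^*) > -\infty$). This completes the proof.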

Remark 3 It is not known whether or not the assumption that $f$ be twice-
continuously differentiable can be dispensed with in theorem (3). Those
conjugate gradient methods which have been proven to converge for general
functions (without sequence breaking) also require at least as strong an
assumption, as we see in the next theorem. (See also the Polak-Ribiere
algorithm [6].)

Theorem 4 Suppose that the sequence $x_0, x_1, \ldots$ is constructed according
to the algorithm $A_3(0)$, and that $f: R^n \to R$ is twice-continuously
differentiable. If $x_0, x_1, \ldots$ converges to a point $x^*$, then $g(x^*) = 0$.

Proof: Suppose $g(x^*) \neq 0$; then there exist $\alpha > 0$ and $i^*$ such
that $\alpha \le \|g(x_i)\| \le 2\alpha$ for all $i \ge i^*$. Without loss of
generality, we shall assume $i^* = 0$. Now, by proposition (5), $A_3(0)$ is
equivalent to the Fletcher-Reeves algorithm (9). Using (9), (1), we obtain

$$\langle p_i', g_i \rangle = -\|g_i\|^2 \quad \text{for all } i \tag{23}$$

$$\begin{aligned}
\|p_i'\|^2 &= \|g_i\|^2 + \frac{\|g_i\|^4}{\|g_{i-1}\|^4} \|p_{i-1}'\|^2 \\
&= \|g_i\|^4 \left( \frac{1}{\|g_i\|^2} + \frac{1}{\|g_{i-1}\|^2} + \cdots + \frac{1}{\|g_0\|^2} \right) \\
&\le \|g_i\|^2\, 4(i+1) \quad \text{for all } i.
\end{aligned} \tag{24}$$

Combining (23) and (24), we obtain

$$\langle p_i', g_i \rangle \le -\gamma_i \|g_i\| \|p_i'\|, \quad \text{where } \gamma_i^2 = \frac{1}{4(i+1)}. \tag{25}$$

But $\sum_{i=0}^{\infty} \gamma_i^2 = \infty$, and hence by theorem (3),
$g(x^*) = 0$, which contradicts the assumption that $g(x^*) \neq 0$. Thus,
theorem (4) is true.

Remark 4 If in theorem (4), it is only assumed that the sequence $x_0$,
$x_1, \ldots$ has a subsequence converging to $x^*$, the proof breaks down. A
convergence result for Fletcher-Reeves given in the literature [7] , which 
is stated for a subsequence, would appear to be in error because the effect 
of an assumed normalization was not adequately accounted for. 

The following result is intended to show that the conditions (19)
and (20) of theorem (3) cannot be relaxed much. In particular, since (19)
requires $0 < \gamma_i \le 1$, $\sum_{i=0}^{\infty} \gamma_i^2 = \infty$
implies $\sum_{i=0}^{\infty} \gamma_i = \infty$. This latter condition
is sometimes believed to be sufficient for convergence.

Proposition 7 The conditions $-\langle p_i, g_i \rangle \ge \gamma_i \|p_i\| \|g_i\|$,
with $\sum_{i=0}^{\infty} \gamma_i = \infty$, do not imply that the corresponding
sequence $x_0, x_1, \ldots$ has accumulation points which are stationary,
even if the sequence is bounded, if $\sum \gamma_i = \infty$ holds for a
convergent subsequence, and if $f$ is twice continuously differentiable and
strictly convex.

Proof: Let $f: R^n \to R$ be defined by $f(x) = \|x\|^2$, let $x_1 \neq 0$,
and let the $p_i$ satisfy

$$-\langle p_i, g_i \rangle = \gamma_i \|p_i\| \|g_i\| \quad \text{for } i = 1, 2, \ldots \tag{26}$$

where the $\gamma_i \in (0,1]$ are chosen so that
$\sum_{i=1}^{\infty} \gamma_i = \infty$ while $\sum_{i=1}^{\infty} \gamma_i^2 < 1$.
But $f(x_{i+1}) = \|x_i\|^2 + 2\lambda_i \langle x_i, p_i \rangle + \lambda_i^2 \|p_i\|^2$,
and $\lambda_i = -\dfrac{\langle p_i, g_i \rangle}{2\|p_i\|^2}$ yields

$$f(x_{i+1}) = \|x_i\|^2 - \frac{\langle p_i, g_i \rangle^2}{4\|p_i\|^2} = f(x_i) - \frac{\gamma_i^2}{4} \|g_i\|^2.$$

Thus,

$$\lim_{i \to \infty} f(x_i) = f(x_1) - \sum_{j=1}^{\infty} \frac{\gamma_j^2}{4} \|g_j\|^2 \ge f(x_1) - \left( \sum_{j=1}^{\infty} \gamma_j^2 \right) \frac{\|g_1\|^2}{4} = \left( 1 - \sum_{j=1}^{\infty} \gamma_j^2 \right) f(x_1) > 0,$$

since $\|g_{i+1}\|^2 = 4\|x_{i+1}\|^2 < 4\|x_i\|^2 = \|g_i\|^2$ for all $i$, and
$\sum_{j=1}^{\infty} \gamma_j^2 < 1$.
Thus, no subsequence of $x_1, x_2, \ldots$ can converge to a stationary point.

Finally, we note that, while $x_1, x_2, \ldots$ need not converge, appeal to the
situation for $R^2$ indicates that the $p_i$ may be chosen so that
$x_1, x_2, \ldots$ zig-zags to a single accumulation point. (In particular, in
$R^2$, specify additionally that $\langle p_i, p_{i+1} \rangle < 0$ for all $i$.
The convergence may be proven rigorously.)

Since the function $\|x\|^2$ has about as many nice properties as one
may ask for, the conditions (19) and (20) cannot be significantly relaxed.

If $\beta$ is allowed to be zero in $A_2(\beta)$, the algorithm is not meaningful
without a restriction on the $\beta_i$. We obtain the following results for this
case.

Theorem 5 Suppose that $x_0, x_1, \ldots$ is constructed according to the
algorithm $A_2(0)$, with the additional restriction that
$\beta^* = \inf_i \beta_i > 0$. If the sequence $x_0, x_1, \ldots$ converges to
a point $x^*$, then $g(x^*) = 0$.

Proof: Suppose that $g(x^*) \neq 0$; then there exist $\alpha > 0$, $i^* \ge 1$,
such that $\|g(x_i)\| \ge \alpha$ for all $i \ge i^*$.
By (5), (1), we obtain

$$\langle p_i, g_i \rangle = -\beta_i \|g_i\|^2, \qquad \|p_i\|^2 = \beta_i^2 \left( \|g_i\|^2 + \frac{\|g_i\|^4}{\|p_{i-1}\|^2} \right)$$

for all $i \ge i^* \ge 1$.
Now $\sup_{i \ge i^*} \|p_i\|^2 = \infty$ iff $\inf_{i \ge i^*} \|p_i\|^2 = 0$
(recall that $\beta_i \le 1$). But $\|p_i\|^2 \ge (\beta^*)^2 \alpha^2 > 0$ for
all $i \ge i^* + 1$, hence $\|p_i\|$ is bounded, say by $\alpha\sqrt{M}$, where
$M \ge 1$ may be assumed without loss of generality. Then

$$\langle p_i, g_i \rangle \le -\beta^* \|g_i\|^2, \quad i \ge i^* \tag{27}$$

$$\|p_i\|^2 \le \alpha^2 M \le M \|g_i\|^2, \quad i \ge i^* \tag{28}$$

i.e., $p_i \in P_{\beta^*,M}(x_i)$, $x_{i+1} \in A_{\beta^*,M}(x_i)$ for all
$i \ge i^*$.

By theorem (2), $x_i \to x^*$ yields $g(x^*) = 0$, which is a contradiction.
Theorem (5) thus holds.

Theorem 6 Suppose that $x_0, x_1, \ldots$ is constructed according to the
algorithm $A_2(0)$, with the additional restrictions that
$\sum_{i=0}^{\infty} \beta_i^2 = \infty$ and $\beta_i/\beta_{i-1}$ bounded. If
$f$ is twice-continuously differentiable, and if the sequence
$x_0, x_1, \ldots$ converges to a point $x^*$, then $g(x^*) = 0$.

Proof: Assume $g(x^*) \neq 0$; then there exist $\alpha > 0$, $i^* \ge 1$ such
that $\alpha \le \|g(x_i)\| \le 2\alpha$ for all $i \ge i^*$.
Now $\|p_i\|^2 \ge (\beta_i)^2 \alpha^2$, $i \ge i^*$, hence

$$\|p_i\|^2 = \beta_i^2 \left( \|g_i\|^2 + \frac{\|g_i\|^4}{\|p_{i-1}\|^2} \right) \le 4\alpha^2 \beta_i^2 + 16\alpha^2 \left( \frac{\beta_i}{\beta_{i-1}} \right)^2 \le M\alpha^2$$

for some $M$, i.e., $\|p_i\|$ is bounded, and hence

$$\langle p_i, g_i \rangle = -\beta_i \|g_i\|^2, \quad i \ge i^* \tag{29}$$

$$\|p_i\|^2 \le M \|g_i\|^2, \quad i \ge i^*. \tag{30}$$

Thus $\langle p_i, g_i \rangle \le -\dfrac{\beta_i}{\sqrt{M}} \|p_i\| \|g_i\|$
for all $i \ge i^*$, and by theorem (3), we conclude that $g(x^*) = 0$; a
contradiction. This completes the proof.

Remark 5 The assumption that $\beta_i/\beta_{i-1}$ is bounded holds for any
nonincreasing sequence of $\beta_i$. It avoids such strategies as choosing
$\beta_i = 1$, $i$ odd, $\beta_i = 1/i$, $i$ even, etc.


III. NUMERICAL RESULTS 


Versions of the preceding algorithms were experimentally compared
to steepest descent and standard conjugate gradient algorithms. For
uniformity of test procedure, all comparisons were made with each method
reverting to steepest descent after every $n + 1$ steps. For $A_2(\beta)$,
$\beta_i$ was chosen to be 1 for all $i$; the algorithm thus becomes

$$p_i = \begin{cases} -g_i + \dfrac{\|g_i\|^2}{\|p_{i-1}\|^2}\, p_{i-1} & \text{if } i \neq k(n+1), \ k = 0, 1, \ldots \\ -g_i & \text{otherwise.} \end{cases} \tag{31}$$

Note that the computational results for the preceding algorithms do not
now depend on $\beta$; these results are only intended to indicate the efficacy
of using search directions of the various forms.

The iterations were stopped when $\|g(x)\|$ was sufficiently small,
$\|g(x)\| \le \epsilon_{tol}$ being the particular criterion. The linear search
method programmed was that originally introduced by Davidon [3], as detailed in
Fletcher and Powell [1]. Briefly, the search method consists of bracketing
a (local) minimum in an interval, and using (possibly repeated) cubic
interpolation to approximate the minimum.
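The cubic interpolation step of such a search can be sketched as follows. This is an illustrative Python version of the standard two-point cubic fit, not the Fortran IV routine used for the experiments; the function name `cubic_minimizer` is hypothetical, and the bracket is assumed to satisfy $\phi'(a) < 0 < \phi'(b)$ with $a < b$.

```python
import math

def cubic_minimizer(a, fa, dfa, b, fb, dfb):
    """Minimizer of the cubic matching values and slopes at a < b.

    fa, fb are function values and dfa, dfb are directional derivatives
    at a and b; the bracket is assumed to have dfa < 0 < dfb."""
    z = 3.0 * (fa - fb) / (b - a) + dfa + dfb
    disc = z * z - dfa * dfb
    if disc < 0.0:
        raise ValueError("interval does not bracket a minimum of the cubic")
    w = math.sqrt(disc)
    return b - (b - a) * (dfb + w - z) / (dfb - dfa + 2.0 * w)
```

For $\phi(t) = t^3 - 3t$ on the bracket $[0, 2]$ the fit is exact and the routine returns the minimizer $t = 1$. For a general function the step is repeated on the sub-interval that still brackets the minimum.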

The functions to which the algorithms were applied are 

(32) $f_1: R^n \to R$ defined by $x \mapsto \langle x - x^*, Q(x - x^*) \rangle$

where $Q$ is a symmetric positive definite matrix

(33) $f_2(x) = \exp(f_1(x)) + (x_6^2 - 1)\, x_1 + 1$

(34) $f_3(x) = 100 (x_2 - x_1^2)^2 + (1 - x_1)^2$ (Rosenbrock, [8]).
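For reference, Rosenbrock's function and its gradient can be written down directly. The Python sketch below is an illustration (the names `rosenbrock` and `rosenbrock_grad` are hypothetical); it uses the standard form $100(x_2 - x_1^2)^2 + (1 - x_1)^2$ with zero-based indexing, so `x[0]` is $x_1$ and `x[1]` is $x_2$.

```python
def rosenbrock(x):
    """f_3(x) = 100 (x_2 - x_1^2)^2 + (1 - x_1)^2."""
    return 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2

def rosenbrock_grad(x):
    """Gradient of f_3 with respect to (x_1, x_2)."""
    t = x[1] - x[0] ** 2
    return [-400.0 * x[0] * t - 2.0 * (1.0 - x[0]), 200.0 * t]
```

At the usual starting point $(-1.2, 1.0)$ the function value is $24.2$, and the unique minimum is at $(1, 1)$, where both the function and its gradient vanish.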


All the algorithms tested were coded, by the same person, in
Fortran IV for execution on the CDC 6400 computer of the University of
California Computing Center. Tables 1-3 summarize the results obtained.


f x (x) = O.Kxj-1) 2 +(x 2 +0. 2) 2 +(x 3 -0. 3) 2 +5(x 4 +0. 4) 2 
* 0.2(Xj-0.5) 2 +. 4 (x ? +x g ) 2 +. 3 (x g -x g ) 2 +l. 5 (x g +x 1Q ) 2 
+ (x 7 +x 1Q ) 2 +( 2 x ± -1.2) 2 +x 6 2 +.5(x 5 +x 2 -x 3 ) 2 + 4(2x 5 ~x 1 +x g ) 2

Remark 6 Algorithm $A_1$ tended to display the same characteristics as
steepest descent for functions $f_1$ and $f_2$, although the convergence was
more rapid. However, for function $f_3$, $A_1$ converged in fewer steps (and
with higher accuracy) than all other methods tested. Although it is not
clear why this occurred, the result for $f_3$ may be misleading because
in $R^2$ algorithm $A_1$ behaves like a conjugate gradient method.

Remark 7 Algorithm $A_2$ performed surprisingly well. Reasonable convergence
was obtained for the quadratic function $f_1$. For $f_2$ and $f_3$ the algorithm
performed as well as or better than the similarly structured Fletcher-Reeves
algorithm. Thus, the fact that $A_2$ is not a conjugate gradient method
appears to be of no particular consequence when dealing with nonquadratic
functions, providing the search directions are of comparable form to those
selected by the Fletcher-Reeves algorithm. Methods to determine the best
way to use the freedom to choose the $\beta_i$ would no doubt lead to better
versions of the algorithm $A_2$.

Remark 8 The algorithm $A_3$, the normalized Fletcher-Reeves algorithm,
converged faster than the Fletcher-Reeves algorithm. This is theoretically
impossible by Proposition (5). In practice it is due to the use of an
approximate procedure to locate the minimum in a given search direction.
Apparently, the normalized version is more compatible with the Davidon
search technique, which utilizes $\|p_i\|$ in obtaining an initial estimate
of the step length $\lambda_i$. Very likely, the relation $\|p_i\| \le \|g_i\|$
for the normalized version (Proposition (6)), as compared to
$\|p_i'\| \ge \|g_i\|$, $i \ge 0$, for (9), is responsible for the improvement.
For this reason, and because the parameter $\beta_i$ of (8) (which measures the
orthogonality of $p_i$ and $g_i$) should be monitored during any computation,
the normalized version of the Fletcher-Reeves algorithm appears to be
preferable to the original version. Note that the normalized version does
not require $\|g_{i-1}\|^2$ for the computation of $p_i$ (8).

IV. CONCLUSIONS

The results of this paper confirm that algorithms which retain some 
of the features of conjugate gradient methods, and some of the features 
of steepest descent, can be quite effective. The global behavior of these 
methods is extremely difficult to predict. In fact, this behavior depends 
on a complex interplay between the choice of search directions and the 
choice of search procedure used. Perhaps when we can better characterize 
this interplay - or learn how to proceed without this convenient de- 
composition - the flexibility and power of computers can be more effectively 
utilized. 

Even when comparing relatively simple classes of algorithms, there are 
questions yet to be answered. How should we best utilize information on 
orthogonality between the gradient and the search direction? How many 
function-gradient evaluations per iteration should we expect (or tolerate) 
for a given choice of search direction-search procedure? The answer to 
questions such as these would appear to be essential to the comparison of 
algorithms within a framework that includes not only iterative quality, 
but the costs of computer implementation.


Algorithm                  Steps      Function-gradient   Computation    Final
                           required   evaluations         time (sec.)†   value

A_1                        48         98                  .448           2.0x10^-6
A_2                        38         78                  --             3.9x10^-5
A_3                        11         24                  .108           8.5x10^-9
Steepest Descent           >99        >200                >.82           2.1x10^-3
Fletcher-Reeves            11         24                                 1.4x10^-10
Fletcher-Powell            10         22                  --             1.4x10^-28
Modified Fletcher-Powell*  10         23                  --             1.3x10^-27

TABLE 1 Function $f_1$: starting point $x_0 = (.5, .5, \ldots, .5)$, $f_1(x_0) = 260$,
stopping criterion $\epsilon_{tol} = 10^{-3}$

† The computation times include the times for two redundant function-gradient
evaluations per iteration, which are not included in the count of function-
gradient evaluations.

* The modified Fletcher-Powell method is described in [4]; it uses

$$H_{i+1} = H_i - \frac{H_i s_{i+1} s_{i+1}^T H_i}{s_{i+1}^T H_i s_{i+1}}, \quad \text{where } s_{i+1} = g_{i+1} - g_i.$$

The results for this algorithm, as well as for steepest descent,
Fletcher-Reeves and Fletcher-Powell, are drawn from a report by Nuytten [9].


Algorithm                  Steps      Function-gradient   Computation    Final
                           required   evaluations         time (sec.)    value

A_1                        95         420                 1.4            .88
A_2                        55         216                 .74            .88
A_3                        54         208                 .74            .88
Steepest Descent           309        1672                4.86           .88
Fletcher-Reeves            59         221                 .78            .88
Fletcher-Powell            35         109                 2.63           .88
Modified Fletcher-Powell   65         206                 3.11           .88

TABLE 2 Function $f_2$: starting point $x_0 = (-.5, -.5, \ldots, -.5)$,
$f_2(x_0) \approx 4 \times 10^{20}$, stopping criterion $\epsilon_{tol} = 10^{-5}$.


Algorithm                  Steps      Function-gradient   Computation    Final
                           required   evaluations         time (sec.)    value

A_1                        23         53                  0.16           1.8x10^-12
A_2                        31         71                  0.21           2.4x10^-10
A_3                        26         60                  0.18           2.9x10^-10
Steepest Descent           134        290                 0.81
Fletcher-Reeves            31         71                  0.21           2.1x10^-10
Fletcher-Powell            32         72                  0.25           1.2x10^-10
Modified Fletcher-Powell   30         63                  0.22           3.1x10^-8

TABLE 3 Function $f_3$: starting point $x_0 = (-1.2, 1.0)$, $f_3(x_0) = 24.2$,
stopping criterion $\epsilon_{tol} = 10^{-6}$